We introduce a machine-learning (ML)-based weather simulator--called "GraphCast"--which outperforms the most accurate deterministic operational medium-range weather forecasting system in the world, as well as all previous ML baselines. GraphCast is an autoregressive model, based on graph neural networks and a novel high-resolution multi-scale mesh representation, which we trained on historical weather data from the European Centre for Medium-Range Weather Forecasts (ECMWF)'s ERA5 reanalysis archive. It can make 10-day forecasts, at 6-hour time intervals, of five surface variables and six atmospheric variables, each at 37 vertical pressure levels, on a 0.25-degree latitude-longitude grid, which corresponds to roughly 25 x 25 kilometer resolution at the equator. Our results show GraphCast is more accurate than ECMWF's deterministic operational forecasting system, HRES, on 90.0% of the 2760 variable and lead time combinations we evaluated. GraphCast also outperforms the most accurate previous ML-based weather forecasting model on 99.2% of the 252 targets it reported. GraphCast can generate a 10-day forecast (35 gigabytes of data) in under 60 seconds on Cloud TPU v4 hardware. Unlike traditional forecasting methods, ML-based forecasting scales well with data: by training on bigger, higher quality, and more recent data, the skill of the forecasts can improve. Together these results represent a key step forward in complementing and improving weather modeling with ML, open new opportunities for fast, accurate forecasting, and help realize the promise of ML-based simulation in the physical sciences.
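As an autoregressive model, GraphCast produces a forecast by repeatedly feeding its own 6-hour prediction back in as input; a 10-day forecast at 6-hour intervals is 40 such steps. A minimal sketch of this rollout loop, where the `step_fn` stand-in is hypothetical and not the actual learned model:

```python
import numpy as np

def rollout(step_fn, initial_state, num_steps):
    """Autoregressively apply a one-step (6-hour) forecast model.

    step_fn maps the current atmospheric state to the state 6 hours
    later; a 10-day forecast at 6-hour intervals is 40 steps.
    """
    states = [initial_state]
    for _ in range(num_steps):
        states.append(step_fn(states[-1]))
    return np.stack(states[1:])

# Toy stand-in for the learned model: damped persistence.
forecast = rollout(lambda x: 0.99 * x, np.ones((3, 4)), num_steps=40)
```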
In this report, we present a low-complexity deep learning framework for acoustic scene classification (ASC). The proposed framework comprises four main steps: front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we first transform audio recordings into Mel, Gammatone, and CQT spectrograms. Next, the data augmentation methods of random cropping, specaugment, and mixup are applied to generate augmented spectrograms before they are fed into deep-learning-based classifiers. Finally, to achieve the best performance, we fuse the probabilities obtained from three individual classifiers, each trained independently on one type of spectrogram. Our experiments on the DCASE 2022 Task 1 development dataset meet the low-complexity requirement and achieve the best classification accuracy of 60.1%, improving the DCASE baseline by 17.2%.
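The late-fusion step, in which class probabilities from the independently trained per-spectrogram classifiers are averaged, can be sketched as follows; the probability vectors below are hypothetical:

```python
import numpy as np

def late_fusion(prob_list):
    """Average class-probability vectors from independently trained
    classifiers (e.g. one per spectrogram type: Mel, Gammatone, CQT)
    and return the fused probabilities plus the predicted class."""
    probs = np.mean(np.stack(prob_list), axis=0)
    return probs, int(np.argmax(probs))

# Hypothetical outputs of three classifiers over three scene classes.
p_mel = np.array([0.6, 0.3, 0.1])
p_gam = np.array([0.2, 0.5, 0.3])
p_cqt = np.array([0.5, 0.4, 0.1])
fused, label = late_fusion([p_mel, p_gam, p_cqt])
```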
Language models demonstrate both quantitative improvements and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; and social bias typically grows with scale in settings with ambiguous context, but this can be improved with prompting.
Proportionality is an attractive fairness concept that has been applied to a range of problems, including the facility location problem, a classic problem in social choice. In our work, we propose a concept called strong proportionality, which ensures that when there are two groups of agents at different locations, both groups incur the same total cost. We show that although strong proportionality is a well-motivated and basic axiom, there is no deterministic strategyproof mechanism satisfying the property. We then identify a randomized mechanism called Random Rank (which uniformly selects a number $k$ between $1$ and $n$ and locates the facility at the $k$-th highest agent location) that satisfies strong proportionality in expectation. Our main theorem characterizes Random Rank as the unique mechanism that achieves universal truthfulness, universal anonymity, and strong proportionality in expectation among all randomized mechanisms. Finally, we show via an average-based mechanism that even stronger ex-post fairness guarantees can be achieved by weakening universal truthfulness to truthfulness in expectation.
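A minimal sketch of the Random Rank mechanism as described above; the reported agent locations are hypothetical:

```python
import random

def random_rank(locations, rng=None):
    """Random Rank: draw k uniformly from {1, ..., n}, then place the
    facility at the k-th highest reported agent location."""
    rng = rng or random.Random()
    n = len(locations)
    k = rng.randint(1, n)  # inclusive on both ends
    return sorted(locations, reverse=True)[k - 1]

# Whatever k is drawn, the facility sits at some agent's location.
facility = random_rank([0.1, 0.5, 0.9, 0.3], rng=random.Random(0))
```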
This paper presents the task of audio-visual scene classification (SC), in which an input video is classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'. To this end, we first collect an audio-visual dataset (videos) of these five crowded contexts from YouTube (in-the-wild scenes). Then, a wide range of deep learning frameworks are proposed to deploy either audio or visual input data independently. Finally, the results obtained from the high-performing deep learning frameworks are fused to achieve the best accuracy score. Our experimental results indicate that audio and visual input factors independently contribute to the performance of the SC task. Notably, an ensemble of deep learning frameworks exploring either audio or visual input data achieves the best accuracy of 95.7%.
The Gibbard-Satterthwaite theorem states that no unanimous and non-dictatorial voting rule is strategyproof. We revisit voting rules and consider a weaker notion of strategyproofness called not obvious manipulability, proposed by Troyan and Morrill (2020). We identify several voting rules that satisfy this notion. We also show that several voting rules, including k-approval, fail to satisfy this property. We characterize conditions under which voting rules are obviously manipulable. One of our insights is that certain rules are obviously manipulable when the number of alternatives is large relative to the number of voters. In contrast to the Gibbard-Satterthwaite theorem, many of the rules we examine are not obviously manipulable. This reflects the relatively easier satisfiability of the notion and the zero-information assumption underlying not obvious manipulability, as opposed to the perfect-information assumption of strategyproofness. We also present algorithmic results for computing obvious manipulations and report experiments.
We focus on a simple, one-dimensional collective decision problem (often referred to as the facility location problem) and explore issues of strategyproofness and proportional fairness. We present several characterization results for mechanisms satisfying strategyproofness and varying levels of proportional fairness. We also characterize one of these mechanisms as the unique equilibrium outcome of any mechanism satisfying natural fairness and monotonicity properties. Finally, we identify strategyproof and proportionally fair mechanisms that provide the best welfare-optimal approximation among all mechanisms satisfying the corresponding fairness axiom.
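In this one-dimensional setting, the textbook strategyproof mechanism is the median; a brief illustrative sketch (agent locations hypothetical, not a mechanism from the paper) of why a misreport cannot pull the facility toward the misreporting agent:

```python
def median_mechanism(reports):
    """Place the facility at the median report. On the real line the
    median is strategyproof: no agent can move the facility toward
    their true location by misreporting."""
    s = sorted(reports)
    return s[len(s) // 2]  # the median for odd n; right median for even n

truthful = median_mechanism([0.1, 0.4, 0.9])
# The agent at 0.9 exaggerates rightward; the median does not move.
misreport = median_mechanism([0.1, 0.4, 1.5])
```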
Ecological Momentary Assessments (EMAs) are an important psychological data source for measuring current cognitive states, affect, behavior, and environmental factors from participants in mobile health (mHealth) studies and treatment programs. Non-response, in which participants fail to respond to EMA prompts, is an endemic problem. The ability to accurately predict non-response could be used to improve EMA delivery and develop compliance interventions. Prior work has explored classical machine learning models for predicting non-response. However, as increasingly large EMA datasets become available, there is the potential to leverage deep learning models that have been effective in other fields. Recently, transformer models have shown state-of-the-art performance in NLP and other domains. This work is the first to explore the use of transformers for EMA data analysis. We address three key questions in applying transformers to EMA data: 1. input representation, 2. encoding temporal information, 3. the utility of pre-training for improving downstream prediction task performance. The transformer model achieves a non-response prediction AUC of 0.77 and significantly outperforms classical ML and LSTM-based deep learning models. We will make our predictive model, trained on a corpus of 40k EMA samples, freely available to the research community in order to facilitate the development of future transformer-based EMA analysis work.
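One plausible way to encode temporal information (question 2 above) is a sinusoidal encoding of the time gap between EMA prompts, analogous to transformer positional encodings; this is an illustrative sketch, not the paper's actual encoding:

```python
import math

def time_encoding(delta_hours, dim=8):
    """Sinusoidal encoding of the gap (in hours) between consecutive
    EMA prompts, in the style of transformer positional encodings."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.append(math.sin(delta_hours * freq))
        enc.append(math.cos(delta_hours * freq))
    return enc

vec = time_encoding(6.0)  # a 6-hour gap between prompts
```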
In the past years, deep learning has seen an increase of usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H&E stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation as well as combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration and that Test-Time Data Augmentation can be a promising alternative when choosing an appropriate set of augmentations. Across methods, a rejection of the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution as well as out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
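The rejection step, discarding the most uncertain tiles before scoring accuracy, can be sketched with ensemble-mean predictive entropy as the uncertainty measure; this is a minimal sketch with toy data, not the paper's code framework:

```python
import numpy as np

def reject_most_uncertain(member_probs, reject_frac=0.2):
    """Rank tiles by predictive entropy of the ensemble-mean class
    probabilities and return indices of the tiles that are kept.

    member_probs: array of shape (members, tiles, classes).
    """
    mean_p = member_probs.mean(axis=0)
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)
    n_reject = int(len(entropy) * reject_frac)
    order = np.argsort(entropy)  # low entropy first = most confident
    return order[: len(entropy) - n_reject]

# Toy ensemble: 3 members, 5 tiles, 2 classes (random probabilities).
probs = np.random.default_rng(0).dirichlet([1, 1], size=(3, 5))
kept = reject_most_uncertain(probs, reject_frac=0.4)
```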
In this paper, we propose a novel framework dubbed peer learning to deal with the problem of biased scene graph generation (SGG). This framework uses predicate sampling and consensus voting (PSCV) to encourage different peers to learn from each other, improving model diversity and mitigating bias in SGG. To address the heavily long-tailed distribution of predicate classes, we propose to use predicate sampling to divide and conquer this issue. As a result, the model is less biased and makes more balanced predicate predictions. Specifically, one peer may not be sufficiently diverse to discriminate between different levels of predicate distributions. Therefore, we sample the data distribution based on frequency of predicates into sub-distributions, selecting head, body, and tail classes to combine and feed to different peers as complementary predicate knowledge during the training process. The complementary predicate knowledge of these peers is then ensembled utilizing a consensus voting strategy, which simulates a civilized voting process in our society that emphasizes the majority opinion and diminishes the minority opinion. This approach ensures that the learned representations of each peer are optimally adapted to the various data distributions. Extensive experiments on the Visual Genome dataset demonstrate that PSCV outperforms previous methods. We have established a new state-of-the-art (SOTA) on the SGCls task by achieving a mean of 31.6.
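The consensus-voting idea, a majority vote over the predicate predicted by each peer, can be sketched as follows; the predicate labels are hypothetical:

```python
from collections import Counter

def consensus_vote(peer_predictions):
    """Majority vote over the predicate predicted by each peer,
    emphasizing the majority opinion (ties broken by first count)."""
    return Counter(peer_predictions).most_common(1)[0][0]

# Hypothetical peers trained on head/body/tail predicate splits.
winner = consensus_vote(["on", "riding", "on"])
```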